Tolerating Changes in a Design Psychology Based Web Page Wrapper

Authors

  • Yang Li
  • Zhan Cui
  • Hongji Yang
  • Hewijin Christine Jiau
Abstract

We introduce an innovative approach to wrapping semi-structured web pages in order to generate structured data. Unlike other work in this area, which physically specifies the location of information, our approach is based on human design psychology, which captures features that are more stable across web pages; we believe this yields a more robust result in coping with changes to those pages. In this paper, we focus on the product advertisement domain. We present a set of design psychology principles for product advertisement, together with their implications for web page wrappers, based on a comprehensive survey of the web sites of major retailers in the U.K. A case study in the mobile phone advertisement domain is used to evaluate our approach.

Keywords: web page wrapper, web maintenance, information retrieval, design psychology


Similar resources

Data Extraction using Content-Based Handles

In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...


Wrapper Maintenance

A Web wrapper is a software application that extracts information from a semi-structured source and converts it to a structured format. While semi-structured sources, such as Web pages, contain no explicitly specified schema, they do have an implicit grammar that can be used to identify relevant information in the document. A wrapper learning system analyzes page layout to generate either gramm...


A Multi-Page Data Extraction Service

We present a service-oriented architecture and a set of techniques for developing wrapper code generators, including the methodology of designing an effective wrapper program construction facility and a concrete implementation, called XWRAPComposer. Our wrapper generation framework has two unique design goals. First, we explicitly separate tasks of building wrappers that are specific to a Web s...


Adaptable Wrapper Generation for Web Page Format Change

In this paper, we propose an adaptive wrapper generator that can generate an adaptable wrapper for handling format changes in networked information sources (NIS). When the format of an NIS changes, the adaptable wrapper can start a recovery phase to discover the extraction rule for the new format of the target NIS. The wrapper can automatically adapt to changes in content tags and accurately extract information. Th...


Automatic Wrapper Adaptation by Tree Edit Distance Matching

Information distributed through the Web keeps growing faster day by day, and for this reason several techniques for extracting Web data have been suggested in recent years. Often, extraction tasks are performed through so-called wrappers, procedures that extract information from Web pages, e.g. by implementing logic-based techniques. Many fields of application today require a strong degree of rob...




Publication date: 2002